Abstract



Graphical models use graphs to compactly capture stochastic dependencies amongst a collection of
random variables. Inference over graphical models corresponds to finding marginal probability distributions
given joint probability distributions. In general, this is computationally intractable, which has led to
a quest for finding efficient approximate inference algorithms. We propose a framework for generalized
inference over graphical models that can be used as a wrapper for improving the estimates of approximate
inference algorithms. Instead of applying an inference algorithm to the original graph, we apply the
inference algorithm to a block-graph, defined as a graph in which the nodes are non-overlapping clusters
of nodes from the original graph. This results in marginal estimates of a cluster of nodes, which we further
marginalize to get the marginal estimates of each node. Our proposed block-graph construction algorithm
is simple, efficient, and motivated by the observation that approximate inference is more accurate on
graphs with longer cycles. We present extensive numerical simulations that illustrate our block-graph
framework with a variety of inference algorithms (e.g., those in the libDAI software package). These
simulations show the improvements provided by our framework.

Fig. 1. Alg is an inference algorithm that estimates marginal distributions given a graphical model. We propose a framework
that generalizes Alg using block-graphs to improve the accuracy of the marginal estimates. 



I. Introduction 

A graphical model is a probability distribution defined on a graph such that each node represents a
random variable (or multiple random variables), and edges in the graph represent conditional independencies^1.
The underlying graph structure in a graphical model leads to a factorization of the joint probability
distribution. Graphical models are used in many applications such as sensor networks, image processing,
computer vision, bioinformatics, speech processing, social network analysis, and ecology [1]-[3], to name
a few. Inference over graphical models corresponds to finding the marginal distribution $p_s(x_s)$ for each
random variable given the joint probability distribution $p(x)$. It is well known that inference over graphical
models is computationally tractable for only a small class of graphical models (graphs with low treewidth
[4]), which has led to much work to derive efficient approximate inference algorithms.

A. Summary of Contributions 

Our main contribution in this paper is a framework that can be used as a wrapper for improving 
the accuracy of approximate inference algorithms. Instead of applying an inference algorithm to the 
original graph, we apply the inference algorithm to a block-graph, defined as a graph in which the 
nodes are non-overlapping clusters of nodes from the original graph. This results in marginal estimates 
of a cluster of nodes of the original graph, which we further marginalize to get the marginal estimates 
of each node. Larger clusters, in general, lead to more accurate inference algorithms at the cost of 
increased computational complexity. Fig. 1 illustrates our proposed block-graph framework for generalized
inference.



1 In a graphical model over a graph $G$, for each pair of nodes $(i,j) \notin G$, $X_i$ is conditionally independent of $X_j$ given $X_{V \backslash \{i,j\}}$,
where $V$ indexes all the nodes in the graph. We can also say that for each edge $(i,j) \in G$, $X_i$ is dependent on $X_j$ given $X_S$, for
all $S$ such that $S \subseteq V \backslash \{i,j\}$.



The key component in our framework is to construct a block-graph. It has been empirically observed
that approximate inference is more accurate on graphs with longer cycles [5]. This motivates our proposed
block-graph construction algorithm where we first find non-overlapping clusters such that the graph over
the clusters is a tree. We refer to the resulting block-graph as a block-tree. The block-tree construction
algorithm runs in linear time by using two passes of breadth-first search over the graph. The second step
in our block-graph construction algorithm is to split large clusters^2 in the block-tree. Using numerical
simulations, we show how our proposed algorithm for splitting large clusters leads to superior inference
algorithms when compared to an algorithm that randomly splits large clusters in the block-tree and an
algorithm that uses graph partitioning to find non-overlapping clusters [6].

As an example, consider applying our block-graph framework to belief propagation (BP) [7], which
finds the marginal distribution at each node by iteratively passing messages between nodes. If the graph
is a tree, BP computes the exact marginal distributions; however, for general graphs with cycles, BP only
approximates the true marginal distributions. Our framework for inference (see Fig. 1) generalizes BP
so that message passing occurs between clusters of nodes, where the clusters are non-overlapping. The
estimates of the marginal distribution at each node can be computed by marginalizing the approximate
marginal distribution of each cluster. Our framework is not limited to BP and can be used as a wrapper for
any inference algorithm on graphical models, as we show in Section V. Using numerical simulations, we
show how our block-graph framework improves the marginal distribution estimates computed by current
inference algorithms in the literature: BP [7], conditioned belief propagation (CBP) [8], loop corrected
belief propagation (LC) [9], [10], tree-structured expectation propagation (TreeEP), iterative join-graph
propagation (IJGP) [11], and generalized belief propagation (GBP) [12]-[15].

B. Related Work 

There has been significant work in extending the BP algorithm of message passing between nodes to
message passing between clusters. It is known that the true marginal distributions of a graphical model
minimize the Gibbs free energy [16]. In [12], [17], the authors show that the fixed points of the BP
algorithm minimize the Bethe free energy, which is an approximation to the Gibbs free energy. This
motivated the generalized belief propagation (GBP) algorithm that minimizes the Kikuchi free energy
[18], a better approximation to the Gibbs free energy. In GBP, message passing is between clusters of
nodes that are overlapping. A more general approach to GBP is proposed in [14] using region graphs

2 For discrete graphical models, the complexity of inference is exponential in the maximum cluster size, which is why using 
the block-tree directly for inference may not be computationally tractable if the maximum cluster size is large. 



and in [15] using the cluster variation method (CVM). References [19], [20] propose some guidelines
for choosing clusters in GBP. Reference [11] proposes a framework for GBP, called iterative join-graph
propagation (IJGP), that first constructs a junction tree, a tree-structured representation of a graph with
overlapping clusters, and then splits larger clusters in the junction tree to perform message passing over
a set of overlapping clusters.

In the original paper describing GBP [17], the authors give an example of how non-overlapping clusters 
can be used for GBP, since, when applying the block-graph framework to BP, the resulting inference 
algorithm (Bm-BP, where m is the cluster size) becomes a class of GBP algorithms where the set 
of overlapping clusters corresponds to cliques in the block-graph. Our numerical simulations identify 
cases where Bm-BP leads to superior marginal estimates when compared to a GBP algorithm that uses 
overlapping clusters. Moreover, since our framework can be applied to any inference algorithm, we show 
that the marginal estimates computed by GBP based algorithms can be improved by applying the GBP 
based algorithm to a block-graph. Our block-graph framework is not limited to generalizing BP and we 
show this in our numerical simulations where we generalize conditioned belief propagation (CBP) and 
loop corrected belief propagation (LC). Both these algorithms have been shown to empirically perform 
better than GBP for certain graphical models [8], [10]. In [6], [21], the authors propose using graph
partitioning algorithms for finding non-overlapping clusters for generalizing the mean field algorithm for
inference [16]. Our numerical results show that our algorithm for finding non-overlapping clusters leads
to superior marginal estimates.

We note that our work differs from some other works on studying graphical models defined over graphs
with non-overlapping clusters. For example, [22] considers the problem of learning a Gaussian graphical
model defined over some block-graph. Similar efforts have been made in [23] for discrete valued graphical
models. In [24], the author analyzes properties of a graphical model defined on a block-tree. In all of the
above works, the underlying graphical model is assumed to be block-structured. In our work, we assume
a graphical model defined on an arbitrary graph and then find a representation of the graphical model
on a block-graph to enable more accurate inference algorithms.

This paper is motivated by our earlier work in studying tree structures for Markov random fields
(MRFs) indexed over continuous indices [25]. In [26], we have shown that a natural tree-like representation
for such MRFs exists over non-overlapping hypersurfaces within the continuous index set. Using
this representation, we derived extensions of the Kalman-Bucy filter [27] and the Rauch-Tung-Striebel
smoother [28] to Gaussian MRFs indexed over continuous indices.




Fig. 2. An example of a graphical model. The global Markov property states that $x_A$ is independent of $x_C$ given $x_B$ since all
paths from A to C pass through B.



C. Paper Organization 

Section II reviews graphical models and the inference problem. Section III outlines our proposed
algorithm for constructing block-trees, a tree-structured graph over non-overlapping clusters. Section IV
presents our algorithm for splitting larger clusters in a block-tree to construct a block-graph. Section V
outlines our block-graph framework for generalizing inference algorithms. Section VI presents extensive
numerical results evaluating our framework on various inference algorithms. Section VII summarizes the
paper and outlines some future research directions.

II. Background: Graphical Models and Inference 

A graphical model is defined using a graph $G = (V, E)$, where the nodes $V = \{1, 2, \ldots, p\}$ index a
collection of random variables $x = \{x_s \in \Omega^d : s \in V\}$ and the edges $E \subseteq V \times V$ encode statistical
independencies Q, [29]. The set of edges can be directed, undirected, or both. Since directed graphical
models, also known as Bayesian networks, can be mapped to undirected graphical models by moralizing
the graph, in this paper, we only consider undirected graphical models, also known as Markov random
fields or Markov networks.

The edges in a graphical model imply Markov properties about the collection of random variables.
The local Markov property states that $x_s$ is independent of $\{x_r : r \in V \backslash \{\mathcal{N}(s) \cup s\}\}$ given $x_{\mathcal{N}(s)}$,
where $\mathcal{N}(s)$ is the set of neighbors of $s$. For example, in Fig. 2, $x_2$ is independent of $\{x_3, x_6, x_7, x_8, x_9\}$
given $\{x_1, x_4, x_5\}$. The global Markov property, which is equivalent to the local Markov property for
non-degenerate probability distributions, states that, for a collection of disjoint nodes A, B, and C, if B
separates A and C, $x_A$ is independent of $x_C$ given $x_B$. An example of the sets A, B, and C is shown
in Fig. 2. From the Hammersley-Clifford theorem [30], the Markov property leads to a factorization of
the joint probability distribution over cliques (fully connected subsets of nodes) in the graph,

$$ p(x_1, x_2, \ldots, x_p) = \frac{1}{Z} \prod_{c \in \mathcal{C}} \psi_c(x_c) , \tag{1} $$

where $\mathcal{C}$ is the set of all cliques in the graph $G = (V,E)$, $\psi_c(x_c) > 0$ are potential functions defined
over cliques, and $Z$ is the partition function, a normalization constant.

Inference in graphical models corresponds to finding marginal distributions, say $p_s(x_s)$, given the
probability distribution $p(x)$ for $x = \{x_1, \ldots, x_p\}$. This problem is of extreme importance in many
domains. A classical example is in estimation when we are given noisy observations $y$ of $x$ and we want
to estimate the underlying random vector. To find the minimum mean square error (MMSE) estimate, we
need to marginalize the conditional probability distribution $p(x|y)$ to find the marginals $p_s(x_s|y)$. An
algorithm for marginalizing $p(x)$ can be used for marginalizing $p(x|y)$. In general, exact inference is
computationally intractable; however, there has been significant progress in deriving efficient approximate
inference algorithms. The main contribution in this paper is the block-graph framework for generalizing
inference algorithms (see Fig. 1) so that the performance of approximate inference algorithms can be
improved.
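
To make the inference problem concrete, the following minimal Python sketch computes node marginals from a factorization of the form (1) by brute-force enumeration. The function name `brute_force_marginals`, the factor representation (a scope tuple plus a callable), and the three-node toy model are illustrative assumptions and do not come from the paper. The loop visits every joint configuration, so its cost grows exponentially with the number of variables, which is the intractability that motivates approximate inference.

```python
from itertools import product

def brute_force_marginals(nodes, states, factors):
    """Exact node marginals of p(x) proportional to the product of the factors.

    nodes:   list of node labels
    states:  list of values that every variable can take
    factors: list of (scope, fn) pairs; fn receives the values of the nodes in scope
    """
    marginals = {s: {v: 0.0 for v in states} for s in nodes}
    Z = 0.0  # partition function, accumulated over all configurations
    for assignment in product(states, repeat=len(nodes)):
        x = dict(zip(nodes, assignment))
        weight = 1.0
        for scope, fn in factors:
            weight *= fn(*(x[s] for s in scope))
        Z += weight
        for s in nodes:
            marginals[s][x[s]] += weight
    for s in nodes:
        for v in states:
            marginals[s][v] /= Z
    return marginals, Z

# Hypothetical toy model: a chain 1 - 2 - 3 with attractive pairwise
# potentials and a bias on node 1 (values chosen only for illustration).
agree = lambda a, b: 2.0 if a == b else 1.0
bias = lambda a: 3.0 if a == 1 else 1.0
toy_factors = [((1, 2), agree), ((2, 3), agree), ((1,), bias)]
toy_marginals, toy_Z = brute_force_marginals([1, 2, 3], [-1, 1], toy_factors)
```

For this toy chain the partition function is 36 and the marginal of node 1 taking the value +1 is 0.75, which can be checked by hand by summing out nodes 3 and 2 in turn.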

III. Block-Trees: Finding Trees Over Non-overlapping Clusters 

Section III-A outlines our algorithm for constructing block-trees. Section III-B defines the notion of
optimal block-trees by using connections between block-trees and junction trees. Section III-C outlines
greedy algorithms for finding optimal block-trees.

A. Main Algorithm 

Definition 1 (Block-Graph and Block-Tree): For a graph $G = (V,E)$, a block-graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is a
graph over clusters of nodes in $V$ such that each node in $V$ is associated with only one cluster in $\mathcal{V}$. In
other words, the clusters in $\mathcal{V}$ are non-overlapping. If the edge set $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is tree-structured, we call
the block-graph a block-tree.

Algorithm 1 outlines our construction of a block-tree $\mathcal{G}$ given an arbitrary graph $G = (V, E)$. Without
loss of generality, we assume that $G$ is connected, i.e., there exists a path between all non-adjacent nodes.
The original graph $G$ and a set of nodes $V_1 \subset V$ are the input. The output is the block-tree $\mathcal{G}$. We refer
to $V_1$ as the root cluster. The algorithm first finds an initial set of clusters and then splits these clusters
to find the final block-tree. We explain the steps of the algorithm.



Forward step: Find clusters $V_1, V_2, \ldots, V_r$ using breadth-first search (BFS) so that $V_2 = \mathcal{N}(V_1)$,
$V_3 = \mathcal{N}(V_2) \backslash \{V_1 \cup V_2\}, \ldots, V_r = \mathcal{N}(V_{r-1}) \backslash \{V_{r-2} \cup V_{r-1}\}$. These clusters serve as initial clusters for the
block-tree. During the BFS step, split each cluster $V_k$ into its connected components $\{V_k^1, \ldots, V_k^{m_k}\}$ using the
subgraph $G(V_k)$, which denotes the graph only over the nodes in $V_k$ (Line 2).

Backwards step: We now merge the clusters in each $V_k$ to find the final block-tree. The key intuition in
this step is that each cluster in $V_k$ should be connected to a single cluster in $V_{k-1}$. If this is not the case, we
merge clusters in $V_{k-1}$ accordingly. Starting at $V_r = \{V_r^1, V_r^2, \ldots, V_r^{m_r}\}$, for each $V_r^j$, $j = 1, \ldots, m_r$,
find all clusters $C(V_r^j)$ in $V_{r-1}$ that are connected to $V_r^j$ (Line 6). Combine all clusters in $C(V_r^j)$ into a
single cluster and update the clusters in $V_{r-1}$ accordingly. Repeat the above steps for all the clusters in
$V_{r-1}, V_{r-2}, \ldots, V_3$.

Algorithm 1: Constructing Block-Trees: BlockTree($G$, $V_1$)
Data: A graph $G = (V, E)$ and a set of nodes $V_1$.
Result: A block-tree $\mathcal{G} = (\mathcal{V}, \mathcal{E})$

1 Find successive neighbors of $V_1$ to construct a sequence of $r$ clusters $V_1, V_2, \ldots, V_r$ such that
  $V_2 = \mathcal{N}(V_1)$, $V_3 = \mathcal{N}(V_2) \backslash \{V_1 \cup V_2\}, \ldots, V_r = \mathcal{N}(V_{r-1}) \backslash \{V_{r-2} \cup V_{r-1}\}$.
2 $\{V_k^1, \ldots, V_k^{m_k}\} \leftarrow$ Find $m_k$ connected components of $V_k$ using subgraph $G(V_k)$.
3 for $i = r, r-1, \ldots, 3$ do
    for $j = 1, 2, \ldots, m_i$ do
      $C(V_i^j) \leftarrow \mathcal{N}(V_i^j) \cap V_{i-1}$ ; All nodes in $V_{i-1}$ connected to $V_i^j$.
      Combine $C(V_i^j)$ into one cluster and update $V_{i-1}$.
8 $\mathcal{E} \leftarrow$ edges between all the clusters in $\mathcal{V}$



The first part of Algorithm 1 finds successive non-overlapping neighbors of the root cluster. This leads
to an initial estimate of the block-tree graph. In the backwards step, we split clusters to form a block-tree.
We illustrate Algorithm 1 with examples.

Example: Consider the grid graph of Fig. 3(a). Choosing $V_1 = \{1\}$, we get the initial estimates of the
clusters as shown in Fig. 3(a). Running the backwards step to identify the final clusters (see Fig. 3(b)),
we get the block-tree in Fig. 3(c).

Fig. 3. (a) Original estimates of the clusters in a grid graph when running the forward pass of Algorithm 1. (b)
The final clusters after running the backwards pass of Algorithm 1. (c) Final block-tree.

Example: In the previous example, the initial estimates of the clusters matched the final estimates and
the final block-tree was a chain-structured graph. We now consider an example where the final block-tree
will in fact be tree-structured. Consider the partial grid graph of Fig. 4(a). Choosing $V_1 = \{7\}$, we get
the initial estimates of the clusters in Fig. 4(a). We now run the backwards step of the algorithm. Since
$V_5 = \{3\}$ is connected to 2 and 6, $C(V_5) = \{2, 6\}$. Thus, {2, 6} become a single cluster. We now find
neighbors of {2, 6} in $V_3 = \{9, 5, 1\}$. It is clear that only {9, 5} are connected to {2, 6}, so {9, 5} become
a single cluster. In this way, we have split $V_3$ into two clusters: $V_3^1 = \{9, 5\}$ and $V_3^2 = \{1\}$. Continuing
the algorithm, we find the remaining clusters as shown in Fig. 4(b). The final block-tree is shown in
Fig. 4(c).

Fig. 4. (a) Original estimates of the clusters in a partial grid when running the forward pass of Algorithm 1. (b)
The final clusters after running the backwards pass of Algorithm 1. (c) Final block-tree.
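
The forward and backwards steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `connected_components` and `block_tree`, the adjacency-dictionary input, and the 3x3 grid with row-by-row node labels are all hypothetical choices. The cluster lists are rescanned for every child cluster, so this sketch favors readability and does not attain the linear running time stated for Algorithm 1.

```python
from collections import deque

def connected_components(nodes, adj):
    """Connected components of the subgraph induced by `nodes`."""
    remaining, components = set(nodes), []
    while remaining:
        start = remaining.pop()
        component, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in remaining:
                    remaining.remove(v)
                    component.add(v)
                    queue.append(v)
        components.append(component)
    return components

def block_tree(adj, root):
    """Return (clusters, edges) of a block-tree rooted at the node set `root`."""
    # Forward step: successive neighborhoods of the root cluster (BFS layers).
    layers, seen = [set(root)], set(root)
    while True:
        frontier = {v for u in layers[-1] for v in adj[u]} - seen
        if not frontier:
            break
        layers.append(frontier)
        seen |= frontier
    # Split each layer into connected components; the root stays one cluster.
    clusters = [connected_components(layer, adj) for layer in layers]
    clusters[0] = [set(root)]
    # Backwards step: merge clusters of layer k-1 that touch the same cluster of layer k.
    for k in range(len(layers) - 1, 1, -1):
        for child in clusters[k]:
            parents = {v for u in child for v in adj[u]} & layers[k - 1]
            touched = [c for c in clusters[k - 1] if c & parents]
            untouched = [c for c in clusters[k - 1] if not (c & parents)]
            clusters[k - 1] = untouched + [set().union(*touched)]
    flat = [frozenset(c) for layer in clusters for c in layer]
    owner = {v: c for c in flat for v in c}
    edges = {frozenset((owner[u], owner[v]))
             for u in adj for v in adj[u] if owner[u] != owner[v]}
    return flat, edges

# Assumed example: a 3x3 grid with nodes 1..9 numbered row by row.
grid = {1: [2, 4], 2: [1, 3, 5], 3: [2, 6], 4: [1, 5, 7], 5: [2, 4, 6, 8],
        6: [3, 5, 9], 7: [4, 8], 8: [5, 7, 9], 9: [6, 8]}
grid_clusters, grid_edges = block_tree(grid, {1})
```

On this assumed grid with root cluster {1}, the sketch yields the chain of clusters {1}, {2, 4}, {3, 5, 7}, {6, 8}, {9}, which agrees with the chain-structured outcome described for the grid example.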

The following proposition characterizes the time complexity and correctness of Algorithm 1.

Proposition 1: Algorithm 1 runs in time $O(|E|)$ and always outputs a block-tree.

Proof: Both the forward step and the backwards step involve a breadth-first search, which has
complexity $O(|E|)$. Algorithm 1 always outputs a block-tree since each cluster in $V_k$ is only connected
to a single cluster in $V_{k-1}$. ∎

Block-trees are closely related to junction trees. In the next Section, we explore this connection to
define optimal block-trees.



B. Optimal Block-Trees 

Junction trees, also known as clique trees or join trees, are tree-structured representations of graphs
using a set of overlapping clusters [31], [32]. The width of a junction tree is the maximum cardinality of a
cluster in the junction tree minus one. The treewidth of a graph is the minimum width over
all possible junction tree representations of the graph. It is well known that several graph related problems that are
computationally intractable in general can be solved efficiently when the graph has low treewidth. For
the problem of inference over graphical models, [4] showed how junction trees can be used for exact
inference over graphical models defined over graphs with small treewidth.

Fig. 5. (a) Junction tree for the block-tree in Fig. 3(c). (b) Junction tree for the block-tree in Fig. 4(c).

Given a block-tree $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, an equivalent junction tree representation can be easily computed
by combining all clusters connected along edges into a single cluster. For example, the junction tree
representations for the block-trees in Fig. 3(c) and Fig. 4(c) are given in Fig. 5(a) and Fig. 5(b), respectively.
Using this junction tree, we can derive exact inference algorithms for graphical models parameterized by
block-trees.

It is easy to see that the complexity of inference using block-trees will depend on the maximum
sum of cluster sizes of adjacent clusters in the block-tree (since this will correspond to the width of
the equivalent junction tree). Thus, an optimal block-tree can be defined as a block-tree $\mathcal{G} = (\mathcal{V}, \mathcal{E})$
for which $\max_{(i,j) \in \mathcal{E}} (|V_i| + |V_j|)$ is minimized. From Algorithm 1, the construction of the block-tree
depends on the choice of the root cluster $V_1$. Thus, finding an optimal block-tree is equivalent to finding
an optimal root cluster. This problem is computationally intractable since we need to search over all
possible combinations of root clusters $V_1$.
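
The conversion to a junction tree and the quantity minimized by an optimal block-tree can be sketched as below. The function names and the chain of clusters (taken from an assumed 3x3 grid rooted at node 1, labelled row by row) are illustrative assumptions. The last line shows why this quantity matters for discrete models: with binary variables, a table over two adjacent clusters has two to the power of their combined size entries.

```python
def junction_tree_clusters(edges):
    """One junction-tree cluster per block-tree edge: the union of its two clusters."""
    return [a | b for a, b in edges]

def block_tree_cost(edges):
    """Maximum of |V_i| + |V_j| over adjacent clusters (V_i, V_j)."""
    return max(len(a) + len(b) for a, b in edges)

# Assumed chain-structured block-tree of a 3x3 grid rooted at node 1.
chain = [frozenset(c) for c in ({1}, {2, 4}, {3, 5, 7}, {6, 8}, {9})]
chain_edges = list(zip(chain[:-1], chain[1:]))

jt_clusters = junction_tree_clusters(chain_edges)
cost = block_tree_cost(chain_edges)   # 5, attained by {2,4}-{3,5,7} and {3,5,7}-{6,8}
table_size = 2 ** cost                # 32 entries for binary variables
```

Comparing `block_tree_cost` across several candidate root clusters is one simple way to see how strongly the choice of $V_1$ affects the resulting block-tree.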

As an example illustrating how the choice of $V_1$ alters the block-tree, consider finding a block-tree
for the partial grid in Fig. 4(a). In Fig. 4(c), we constructed a block-tree using $V_1 = \{7\}$ as the root
cluster. The maximum sum of adjacent cluster sizes in Fig. 4(c) is four. Instead of choosing $V_1 = \{7\}$,
let $V_1 = \{7, 4\}$. The initial estimates of the clusters are shown in Fig. 6(a). The final block-tree is shown
in Fig. 6(c). Since the clusters {9, 6, 2} and {8, 5} are adjacent, the maximum sum of adjacent cluster sizes
is five.

Fig. 6. (a) Original estimates of the clusters in the partial grid using $V_1 = \{7, 4\}$ as the root cluster. (b) Splitting of clusters.
(c) Final block-tree.



C. Greedy Algorithms for Finding Optimal Block-Trees 

In the previous Section, we saw that finding optimal block-trees is computationally intractable. In this
Section, we propose three greedy algorithms for finding optimal block-trees that have varying degrees of
computational complexity.

Minimal degree node - MinDegree: In this approach, which we call MinDegree, we find the node with
minimal degree and use that node as the root cluster. The intuition behind this is that the minimal degree
node may lead to the smallest number of nodes being added in the clusters. The complexity of this
approach is $O(n)$, where $n$ is the number of nodes in the graph.

The next two algorithms are based on the relationship between junction trees and block-trees outlined
in Section III-B. Recall that for every block-tree, we can find a junction tree. This means that an optimal
junction tree (a junction tree with minimal width) may be used to find an approximate optimal block-tree.
Further, finding optimal junction trees corresponds to finding optimal elimination orders in graphs
[33]. Thus, we can make use of greedy algorithms for finding optimal elimination orders to find optimal
block-trees.

Using an elimination order - GreedyDegree: One of the simplest algorithms for finding an approximate
optimal elimination order is known as GreedyDegree [34], [35], where the elimination order corresponds
to the sorted list of nodes in increasing degree. The complexity of GreedyDegree is $O(n \log n)$ since we
just need to sort the nodes. Using the elimination order, we triangulate^3 the graph to find the cliques.
These cliques correspond to the set of clusters in the junction tree representation. We search over a
constant number of cliques to find an optimal root cluster.

Using an elimination order - GreedyFillin: Another popular greedy algorithm is to find an optimal
elimination order such that at each step in the triangulation algorithm, we choose a node that adds a
minimal number of extra edges in the graph. This is known as GreedyFillin [36] and has polynomial
complexity. Thus, GreedyFillin is in general slower than GreedyDegree, but does lead to slightly better
elimination orders on average. To find the block-tree, we again search over a constant number of cliques
over the triangulated graph.

3 A graph is triangulated if each cycle of length four or more has an edge connecting non-adjacent nodes.

Fig. 7. Plot showing the performance of three different greedy heuristics for finding optimal block-trees.
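
A minimal Python sketch of the GreedyFillin rule described above follows: at each step it eliminates the node whose elimination adds the fewest extra edges, connects that node's neighbors, and records the resulting clique. The function names, the adjacency-dictionary input, the tie-breaking by node label, and the four-cycle test graph are assumptions made for illustration. The search over cliques for a root cluster is not included.

```python
from itertools import combinations

def fill_in_edges(adj, v):
    """Pairs of neighbors of v that are not yet adjacent to each other."""
    return [(a, b) for a, b in combinations(sorted(adj[v]), 2) if b not in adj[a]]

def greedy_fillin(graph):
    """Return (elimination order, cliques, width) using the minimum fill-in rule."""
    adj = {u: set(nbrs) for u, nbrs in graph.items()}
    order, cliques = [], []
    while adj:
        # Choose the node that adds the fewest edges; ties are broken by label.
        v = min(adj, key=lambda u: (len(fill_in_edges(adj, u)), u))
        for a, b in fill_in_edges(adj, v):
            adj[a].add(b)
            adj[b].add(a)
        neighbors = adj.pop(v)
        for u in neighbors:
            adj[u].discard(v)
        order.append(v)
        cliques.append(neighbors | {v})
    width = max(len(c) for c in cliques) - 1
    return order, cliques, width

# Assumed test graph: the cycle 1 - 2 - 3 - 4 - 1.
four_cycle = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}
cycle_order, cycle_cliques, cycle_width = greedy_fillin(four_cycle)
```

For the four-cycle, eliminating any node first adds exactly one chord, and the largest clique produced has three nodes, so the computed width is 2.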

We now evaluate the three different greedy algorithms, MinDegree, GreedyDegree, and GreedyFillin,
for finding optimal block-trees in Fig. 7. To do this, we create clusters of size $k$ such that the total number
of nodes is $n$ (one cluster may have fewer than $k$ nodes). We then form a tree over the clusters and associate
a clique between two clusters connected to each other. We then remove a certain fraction of edges over
the graph (not the block-tree), but make sure that the graph is still connected. By construction, the width
of the graph constructed is at most $2k$. Fig. 7 shows the performance of MinDegree, GreedyDegree, and
GreedyFillin over graphs with different numbers of nodes and different values of $k$. We clearly see that
both GreedyDegree and GreedyFillin compute widths that are close to optimal. The main idea in this
Section is that we can use various known algorithms for finding optimal junction trees to find optimal
block-trees.
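
The test-graph construction described above can be sketched as follows. Several details are not specified in the text and are assumptions here: the random tree is grown by attaching each cluster to a uniformly chosen earlier cluster, edges are removed in a random order, and a removal is undone whenever it would disconnect the graph. The names `is_connected` and `random_block_structured_graph` and the parameter `drop_fraction` are hypothetical.

```python
import random
from itertools import combinations

def is_connected(n, edges):
    """Check connectivity of a graph on nodes 0..n-1 by depth-first search."""
    adj = {u: set() for u in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in adj[u] - seen:
            seen.add(v)
            stack.append(v)
    return len(seen) == n

def random_block_structured_graph(n, k, drop_fraction, seed=0):
    """Clusters of size k, a random tree over them, cliques on adjacent clusters."""
    rng = random.Random(seed)
    nodes = list(range(n))
    clusters = [nodes[i:i + k] for i in range(0, n, k)]  # last cluster may be smaller
    tree_edges = [(rng.randrange(c), c) for c in range(1, len(clusters))]
    edges = set()
    for a, b in tree_edges:
        edges.update(combinations(sorted(clusters[a] + clusters[b]), 2))
    # Remove a fraction of the edges while keeping the graph connected.
    target = int(drop_fraction * len(edges))
    candidates = sorted(edges)
    rng.shuffle(candidates)
    removed = 0
    for e in candidates:
        if removed == target:
            break
        edges.remove(e)
        if is_connected(n, edges):
            removed += 1
        else:
            edges.add(e)  # undo a removal that disconnects the graph
    return clusters, tree_edges, edges

sample_clusters, sample_tree, sample_edges = random_block_structured_graph(14, 4, 0.3)
```

Every edge of the generated graph lies inside the union of two tree-adjacent clusters, so the planted block-tree has adjacent cluster sizes summing to at most $2k$.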



D. Exact Inference Using Block-Trees 

In the literature, exact inference over graphical models using non-overlapping clusters is referred to
as Pearl's clustering algorithm Q. In [37] and [38], the authors use non-overlapping clustering for
some particular directed graphical models for an application in medical diagnostics. For lattices, [39]-[41]
derive inference algorithms by scanning the lattice horizontally (or vertically). Our block-tree construction
algorithm provides a principled way of finding non-overlapping clusters over arbitrary graphs.

Inference over graphical models defined on block-trees can be done by extending the belief propagation
(BP) algorithm [7]. The computational complexity of BP will depend on $\max_{(i,j) \in \mathcal{E}} (|V_i| + |V_j|)$. On the
other hand, the computational complexity of exact inference using other frameworks that use overlapping
clusters depends on the width of the graph |@), [42]-[44], which is in general less than or equal to
$\max_{(i,j) \in \mathcal{E}} (|V_i| + |V_j|)$. The main advantage of using block-trees for exact inference is that the complexity
of constructing block-trees is $O(|E|)$, whereas the complexity of constructing tree-decompositions for
inference using frameworks that use overlapping clusters is worse than $O(|E|)$ [33], PRl, [43]. Thus,
block-trees are suitable for exact inference over time-varying graphical models [46], [47] such that the
clusters in the block-tree are small.

IV. Block-Graph: Splitting Clusters in a Block-Tree 

In this Section, we outline a greedy algorithm for splitting large clusters in a block-tree to form a
block-graph. This is an important step in our proposed framework (see Fig. 1) for generalizing inference
algorithms since we apply the inference algorithm to the block-graph as opposed to the original graph.
Note that we can use the block-tree itself for inference; however, for many graphs this is computationally
intractable since the complexity of inference for discrete graphical models using a block-tree is exponential
in $\max_{(i,j) \in \mathcal{E}} (|V_i| + |V_j|)$. Thus, when the size of one cluster in the block-tree is large, exact inference
using block-trees will be computationally intractable.

We modify Algorithm 1 for constructing block-trees to construct block-graphs such that all clusters
have cardinality at most $m$.

Step 1. Using an initial cluster of nodes $V_1$, find clusters $V_1, V_2, \ldots, V_r$ using breadth-first search (BFS)
such that $V_2 = \mathcal{N}(V_1)$, $V_3 = \mathcal{N}(V_2) \backslash \{V_1 \cup V_2\}, \ldots, V_r = \mathcal{N}(V_{r-1}) \backslash \{V_{r-2} \cup V_{r-1}\}$. While doing
the BFS, write $V_k$ as the set of all connected components in the subgraph $G(V_k)$. Thus, $V_k$ is a
set of clusters.
Step 2. For $V_r$, if there exists any cluster that has cardinality greater than $m$, partition those clusters.
Let $V_r = \{V_r^1, V_r^2, \ldots, V_r^{m_r}\}$ be the final set of clusters.
Step 3. Perform the next steps for each $k = r-1, r-2, \ldots, 1$, starting at $k = r-1$. Let $\bar{V}_k$ be the set of
all clusters in $V_k$ that have cardinality greater than $m$. Partition all clusters in $\bar{V}_k$ into
clusters of size at most $m$.
Step 4. Merge the clusters in the set $V_k \backslash \bar{V}_k$. The idea used in merging clusters is that if two clusters are
connected to the same cluster in $V_{k+1}$, by merging these two clusters, we reduce one edge in the
final block-graph. Further, if two clusters in $V_k$ are not connected to the same cluster in $V_{k+1}$, we
do not merge these two clusters, since the number of edges in the final block-graph will remain
the same. The final clusters constructed using the above rules are denoted as $V_k = \{V_k^1, \ldots, V_k^{m_k}\}$.
Step 5. The block-graph is given by the clusters $\mathcal{V} = \{V_k^1, V_k^2, \ldots, V_k^{m_k}\}_{k=1,\ldots,r}$ and the set of edges $\mathcal{E}$
between clusters.

4 Our algorithm for finding optimal block-trees uses the junction tree construction algorithm, so even if $\max_{(i,j) \in \mathcal{E}} (|V_i| + |V_j|)$
is large and the treewidth of the graph is small, we can detect this and use junction trees for inference.

Fig. 8. Explaining Step 4 in the block-graph construction algorithm. Given the block-graph in (a), if we merge nodes 2 and 3,
we get the block-graph in (b). If we merge nodes 3 and 4, we get the block-graph in (c). The block-graph in (c) has just one
loop.

Fig. 9. (a) Original graph. (b) Block-graph representation of the graph in (a).


The key step in the above algorithm is Step 4, where we cluster nodes appropriately. Fig. 8 explains
the intuition behind merging clusters with an example. Suppose we use the block-graph construction
algorithm up to Step 3 and now we want to merge clusters in $V_2 = \{2, 3, 4\}$. If we ignore Step 4 and
merge clusters randomly, we might get the block-graph in Fig. 8(b) on merging nodes 2 and 3. If we use
Step 4, since nodes 3 and 4 are connected to the same node, we merge these to get the block-graph in
Fig. 8(c). The graph in Fig. 8(c) has a single cycle with five edges, whereas the graph in Fig. 8(b) has
two cycles of size four and three. It has been observed that inference over graphs with longer cycles is
more accurate than inference over graphs with shorter cycles [5]. Thus, our proposed algorithm leads to
block-graphs that are favorable for inference.
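
One possible reading of the merging rule in Step 4 is sketched below: two clusters in the same layer are merged only if they share a neighboring cluster in the next layer and the merged cluster still has at most $m$ nodes. The greedy pairwise scan, the function name `merge_by_shared_child`, and the labels `"a"` and `"b"` for the next-layer clusters are assumptions, since the text does not fix the order in which candidate pairs are considered.

```python
def merge_by_shared_child(clusters, children, m):
    """Greedily merge clusters of one layer that share a cluster in the next layer.

    clusters: list of node sets in layer k
    children: children[i] is the set of next-layer cluster ids adjacent to clusters[i]
    m:        maximum allowed cluster size
    """
    items = [(set(c), set(children[i])) for i, c in enumerate(clusters)]
    changed = True
    while changed:
        changed = False
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                (ci, ki), (cj, kj) = items[i], items[j]
                # Merge only when a next-layer cluster is shared and the size bound holds.
                if ki & kj and len(ci) + len(cj) <= m:
                    items[i] = (ci | cj, ki | kj)
                    del items[j]
                    changed = True
                    break
            if changed:
                break
    return [c for c, _ in items]

# Hypothetical layer {2}, {3}, {4} in which only 3 and 4 share a next-layer cluster "b".
layer = [{2}, {3}, {4}]
shared = [{"a"}, {"b"}, {"b"}]
merged = merge_by_shared_child(layer, shared, m=2)
```

In this hypothetical layer the sketch merges 3 and 4 and leaves 2 alone, mirroring the choice that the text prefers over merging 2 and 3.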

V. Inference Using Block-Graphs

Define a graphical model on a graph $G = (V,E)$ using a collection of $p$ random variables $x =
(x_1, \ldots, x_p)$, where each $x_k$ takes values in $\Omega^d$, where $d \geq 1$. Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be a block-graph
representation of the graph $G = (V,E)$. To derive inference algorithms over the block-graph, we need
to define appropriate potentials (or factors) associated with each clique in the block-graph. This can be
done by mapping the potentials from the original graph to the block-graph. As an example, let $G$ be the
graph in Fig. 9(a) and let the probability distribution over $G$ be given by

$$ p(x) = \frac{1}{Z} \psi_{1,2}(x_1, x_2) \psi_{1,4}(x_1, x_4) \psi_{1,3}(x_1, x_3) \psi_{2,4,5}(x_2, x_4, x_5) \psi_{4,5,6,7}(x_4, x_5, x_6, x_7) \psi_{3,6}(x_3, x_6) . \tag{2} $$

Let the clusters in the block-graph representation of $G$ in Fig. 9(b) be $V_1 = \{1, 2\}$, $V_2 = \{4, 5\}$, $V_3 = \{3\}$,
and $V_4 = \{6, 7\}$. The probability distribution in (2) can be written in terms of the block-graph as follows:

$$ p(x) = \frac{1}{Z} \Psi_{1,2}(x_{V_1}, x_{V_2}) \Psi_{1,3}(x_{V_1}, x_{V_3}) \Psi_{2,4}(x_{V_2}, x_{V_4}) \Psi_{3,4}(x_{V_3}, x_{V_4}) , \tag{3} $$

where

$$ \Psi_{1,2}(x_{V_1}, x_{V_2}) = \psi_{1,2}(x_1, x_2) \psi_{1,4}(x_1, x_4) \psi_{2,4,5}(x_2, x_4, x_5) \tag{4} $$
$$ \Psi_{1,3}(x_{V_1}, x_{V_3}) = \psi_{1,3}(x_1, x_3) \tag{5} $$
$$ \Psi_{2,4}(x_{V_2}, x_{V_4}) = \psi_{4,5,6,7}(x_4, x_5, x_6, x_7) \tag{6} $$
$$ \Psi_{3,4}(x_{V_3}, x_{V_4}) = \psi_{3,6}(x_3, x_6) . \tag{7} $$
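
The mapping from original potentials to block-graph potentials in this example can be reproduced with a short Python sketch that assigns each factor to the block-graph edge whose two clusters cover its scope. The function name `group_factors` and the rule for factors that lie inside a single cluster (attach them to the first listed block-graph edge containing that cluster) are assumptions; any incident edge would give an equivalent parameterization.

```python
def group_factors(scopes, owner, block_edges):
    """Assign each factor scope to a block-graph edge covering all of its nodes."""
    groups = {edge: [] for edge in block_edges}
    for scope in scopes:
        touched = tuple(sorted({owner[v] for v in scope}))
        if len(touched) == 2:
            groups[touched].append(scope)
        elif len(touched) == 1:
            # Factor inside one cluster: attach it to the first incident edge (assumption).
            edge = next(e for e in block_edges if touched[0] in e)
            groups[edge].append(scope)
        else:
            raise ValueError(f"factor {scope} spans more than two clusters")
    return groups

# Clusters and factor scopes of the example distribution above.
clusters = {1: {1, 2}, 2: {4, 5}, 3: {3}, 4: {6, 7}}
owner = {v: k for k, members in clusters.items() for v in members}
factor_scopes = [(1, 2), (1, 4), (1, 3), (2, 4, 5), (4, 5, 6, 7), (3, 6)]
block_edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
groups = group_factors(factor_scopes, owner, block_edges)
```

Each original factor ends up in exactly one group, which is why the block-graph parameterization represents the same distribution as the original factorization.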

Let Alg be an algorithm for inference over graphical models. Inference over the graph $G$ can be
performed using Alg with inputs being the potentials in (2). Inference over the block-graph can be
performed using Alg with input being the potentials in (4)-(7). To get the marginal distributions from the
block-graph, we need to further marginalize the joint probability distribution over each cluster.
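
The final marginalization step is simple enough to sketch directly. Here a cluster estimate is assumed to be stored as a dictionary from joint value tuples to probabilities; this representation, the function name `node_marginals`, and the numbers in the example are hypothetical and are not output from any of the algorithms discussed.

```python
def node_marginals(cluster_nodes, cluster_marginal):
    """Sum a joint estimate over a cluster down to one marginal per node.

    cluster_nodes:    tuple of node labels, in the order used by the keys
    cluster_marginal: dict mapping a tuple of values to its estimated probability
    """
    result = {s: {} for s in cluster_nodes}
    for values, prob in cluster_marginal.items():
        for s, v in zip(cluster_nodes, values):
            result[s][v] = result[s].get(v, 0.0) + prob
    return result

# Hypothetical estimate for a cluster {4, 5} of binary variables.
estimate = {(-1, -1): 0.4, (-1, 1): 0.1, (1, -1): 0.2, (1, 1): 0.3}
marg = node_marginals((4, 5), estimate)
```

The cost of this step is linear in the size of the cluster table, so it is negligible compared with running the inference algorithm on the block-graph.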

Remark 1: Both the representations (2) and (3) are equivalent, so we are not making any approximations
when parameterizing the graphical model using block-graphs.

Remark 2: There is a trade-off in choosing the size of the clusters in the block-graph. Generally, 
as observed in our numerical simulations, larger clusters lead to better estimates at the cost of more 
computations. 

Remark 3: We presented the block-graph framework using undirected graphical models. The results can be easily generalized to settings where the probability distribution is represented as a factor graph.

VI. Numerical Simulations

In this Section, we provide numerical simulations to show how our proposed block-graph framework for generalizing inference algorithms can be used to improve the performance of current approximate inference algorithms that have been proposed in the literature. Throughout this Section, we assume $x_s \in \{-1, +1\}$ and the probability distribution over $x$ factorizes as

$$p(x) = \frac{1}{Z} \prod_{i=1}^{p} \phi_i(x_i) \prod_{(i,j) \in E} \psi_{ij}(x_i, x_j) . \tag{8}$$

The node potentials are given by $\phi_i(x_i) = \exp(-a_i x_i)$, where $a_i \sim \mathcal{N}(0, 0.1)$, and the edge potentials are given by

$$\text{Repulsive (REP):} \quad \psi_{ij}(x_i, x_j) = \exp(-|b_{ij}| x_i x_j) \tag{9}$$

$$\text{Attractive (ATT):} \quad \psi_{ij}(x_i, x_j) = \exp(|b_{ij}| x_i x_j) \tag{10}$$

$$\text{Mixed (MIX):} \quad \psi_{ij}(x_i, x_j) = \exp(-b_{ij} x_i x_j) , \tag{11}$$

where $b_{ij} \sim \mathcal{N}(0, \sigma)$ and $\sigma$ is the interaction strength. For distributions with attractive (repulsive) potentials, neighboring random variables are more likely to take the same (opposite) value. For distributions with mixed potentials, some neighbors are attractive, whereas some are repulsive. We study several approximate inference algorithms that have been proposed in the literature: Belief Propagation (BP) [7], Iterative Join-Graph Propagation (IJGP-i), Generalized Belief Propagation (GBP-i), Conditioned Belief Propagation (CBP-l) [8], Loop Corrected Belief Propagation (LC) [10], and Tree-Structured Expectation Propagation (TreeEP) [49]. In IJGP-i and GBP-i, the integer i refers to the maximum size of the clusters, where the clusters in these algorithms are overlapping. The clusters in GBP-i are selected by finding cycles of length i in the graph. In CBP-l, l is an integer that refers to the number of clamped variables when performing inference: a larger l in general leads to more accurate marginal estimates. We use the libDAI software package for all the inference algorithms except for IJGP, where we use the software provided by the authors at [51]. For an inference algorithm Alg, we refer to the generalized inference algorithm as Bm-Alg, where m is an integer denoting the maximum size of a cluster in the block-graph.
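The repulsive, attractive, and mixed potentials defined above can be generated along the following lines for an n x n grid graph. This is a minimal sketch rather than the code used for the experiments: the function names are hypothetical, and the second argument of the normal distribution is assumed to be a standard deviation, which the text does not state.

```python
import math
import random


def grid_edges(n):
    """Edges of an n x n grid graph with nodes numbered 0..n*n-1 row by row."""
    edges = []
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if c + 1 < n:
                edges.append((i, i + 1))  # right neighbor
            if r + 1 < n:
                edges.append((i, i + n))  # bottom neighbor
    return edges


def sample_potentials(n, sigma, kind, rng):
    """Draw node and edge potentials for binary variables in {-1, +1}.

    ASSUMPTION: 0.1 and sigma are used as standard deviations.
    kind is one of "REP", "ATT", "MIX".
    """
    states = (-1, +1)
    node_pot = {}
    for i in range(n * n):
        a = rng.gauss(0.0, 0.1)
        node_pot[i] = {x: math.exp(-a * x) for x in states}

    edge_pot = {}
    for i, j in grid_edges(n):
        b = rng.gauss(0.0, sigma)
        if kind == "REP":
            w = -abs(b)   # favors opposite values
        elif kind == "ATT":
            w = abs(b)    # favors equal values
        elif kind == "MIX":
            w = -b        # sign varies from edge to edge
        else:
            raise ValueError("kind must be 'REP', 'ATT', or 'MIX'")
        edge_pot[(i, j)] = {
            (xi, xj): math.exp(w * xi * xj) for xi in states for xj in states
        }
    return node_pot, edge_pot


rng = random.Random(0)
node_att, edge_att = sample_potentials(5, 1.0, "ATT", rng)
node_rep, edge_rep = sample_potentials(5, 1.0, "REP", rng)
```

With attractive potentials every edge table assigns at least as much weight to equal values as to opposite values, and the reverse holds for repulsive potentials.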

We consider two types of graphs: (i) grid graphs and (ii) random regular graphs, where each node in the graph has the same degree and the edges are chosen randomly. Both these graphs have been used extensively in the literature for evaluating inference algorithms [10]. We compare inference algorithms using the mean absolute error:

$$\text{Error} = \frac{1}{p} \sum_{s=1}^{p} \sum_{x_s \in \{-1,+1\}} \left| \hat{p}_s(x_s) - p_s(x_s) \right| , \tag{12}$$



where $\hat{p}_s(x_s)$ is the marginal estimate computed by an approximate inference algorithm and $p_s(x_s)$ is the true marginal distribution. To evaluate the computational complexity, we measure the time taken in running the inference algorithms on a 2.66GHz Intel(R) Xeon(R) X5355 processor with 32 GB memory. Since all the approximate inference algorithms we considered are iterative algorithms, we set the maximum number of iterations to be 1000 and stopped the inference algorithm when the mean absolute difference between the new and old marginal estimates was less than $10^{-9}$. All the code and graphical models used in the numerical simulations can be downloaded from http://www.ima.umn.edu/~dvats/GeneralizedInference.html
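The error measure and the stopping rule described above can be sketched as follows. The sketch assumes the error is normalized by the number of nodes p, that marginals are stored as nested dictionaries, and that one iteration of an inference algorithm is available as a callable; `step`, `toy_step`, and `target` are hypothetical stand-ins, not part of libDAI or of the authors' code.

```python
def mean_abs_error(estimates, truths, states=(-1, +1)):
    """Average over nodes of the summed absolute marginal differences."""
    total = 0.0
    for s in truths:
        for x in states:
            total += abs(estimates[s][x] - truths[s][x])
    return total / len(truths)


def run_until_converged(step, marginals, max_iters=1000, tol=1e-9):
    """Iterate `step` until the mean absolute change drops below `tol`.

    step: hypothetical callable mapping current marginals to updated ones.
    Returns the final marginals and the number of iterations used.
    """
    iterations = 0
    for iterations in range(1, max_iters + 1):
        new = step(marginals)
        diffs = [abs(new[s][x] - marginals[s][x]) for s in new for x in new[s]]
        marginals = new
        if sum(diffs) / len(diffs) < tol:
            break
    return marginals, iterations


# Toy update that moves halfway toward a fixed target on every call.
target = {0: {-1: 0.3, +1: 0.7}}


def toy_step(m):
    return {s: {x: 0.5 * (m[s][x] + target[s][x]) for x in m[s]} for s in m}


start = {0: {-1: 0.5, +1: 0.5}}
final, iters = run_until_converged(toy_step, start)
example_error = mean_abs_error({0: {-1: 0.4, +1: 0.6}}, {0: {-1: 0.5, +1: 0.5}})
```

In the toy run the change halves at every iteration, so the loop stops well before the iteration cap.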



A. Evaluating the Block-Graph Construction Algorithm 

We first evaluate our proposed algorithm for constructing block-graphs (see Section IV), where we split large clusters in a block-tree. Fig. 10 shows the results of applying our block-graph framework to generalize BP, CBP, LC, and TreeEP on a 5 x 5 grid graph. We compare our algorithm to an algorithm that randomly splits the clusters in a block-tree and an algorithm proposed in [6] that uses graph partitioning to find non-overlapping clusters. In Fig. 10, the solid lines correspond to our algorithm (see legend B2-BP), the dashed lines correspond to random splitting of clusters (see legend RandB2-BP), and the dotted lines correspond to graph partitioning (see legend GP-B2-BP). The results reported are averages over 100 trials.

Remark 4: It is clear that the graph partitioning approach performs the worst amongst the three different algorithms (the dotted line is above the solid and dashed lines). For TreeEP, we observe that the graph partitioning approach performs worse than the original algorithm that does not use block-graphs. This suggests that the graph partitioning algorithm in [6] is not suitable for the inference algorithms considered in Fig. 10. We did not apply the graph partitioning algorithm to LC since the corresponding inference algorithm was very slow.

Remark 5: In most cases, our proposed algorithm for constructing block-graphs performs better than using an algorithm that randomly splits clusters (the solid line is below the dashed line). Interestingly, for TreeEP, the random algorithm performs worse than the original algorithm. We also observed that both the random algorithm and the graph partitioning algorithm took more time than our proposed algorithm. This suggests that our proposed block-graph construction algorithm leads to block-graphs that are favorable for inference.

Fig. 10. Evaluating the block-graph construction algorithm in Section IV on a 5 x 5 grid graph. The solid lines, denoted by Bm-Alg for an inference algorithm Alg, correspond to using our proposed block-graph construction algorithm. The dashed lines correspond to an inference algorithm that randomly splits larger clusters in a block-tree. The dotted lines correspond to an inference algorithm that uses graph partitioning to find clusters. The plots in the top, middle, and bottom row correspond to repulsive, attractive, and mixed potentials, respectively.



B. Grid Graphs 

Tables I, II, III, IV, and V show results of applying the block-graph framework for inference over graphical models defined on grid graphs.

Remark 6: In general, we observe that for all cases considered, applying the block-graph framework leads to better marginal estimates. This is shown in the Tables, where for each inference algorithm, we highlight the algorithm leading to the smallest mean error in bold. For example, when using block-graphs of size two for BP, the error decreases by as much as 25% (see BP vs. B2-BP), whereas when using block-graphs of size three for BP, the error decreases by as much as 50% (see BP vs. B3-BP).

Fig. 11. Comparing the performance of BP, IJGP, and GBP. Algorithms with the same message passing complexity are plotted in the same color.

Remark 7: It is interesting to compare BP, IJGP, and GBP, where both IJGP and GBP are based on finding overlapping clusters and IJGP first constructs a junction tree and then splits clusters in the junction tree to find overlapping clusters. Note that for the class of graphical models considered in (8), Bm-BP belongs to the class of GBP-2m algorithms since we can map the block-graph into an equivalent graph with overlapping clusters, as is done when converting a block-tree into a junction tree (see Fig. 5). Further, IJGP-2m is also a GBP-2m algorithm [11]. Thus, we want to compare Bm-BP, IJGP-2m, and GBP-2m. It is clear that GBP-2m leads to superior marginal estimates; however, this comes at the cost of significantly more computations. Fig. 11 compares Bm-BP to IJGP-2m. We observe that for many cases, Bm-BP leads to better marginal estimates than IJGP-2m. We note that comparing Bm-BP to IJGP-2m may not be appropriate since the stopping criteria for IJGP may be different from that of the BP algorithm.$^5$
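The size bound behind the statement that Bm-BP is a GBP-2m algorithm can be illustrated with a small sketch. It only shows that merging the two clusters at the ends of each block-graph edge yields overlapping clusters of size at most 2m; it is not a construction of a full GBP region graph. The clusters and edges are those of the block-graph example in Section V, reused here for illustration, and the function name is hypothetical.

```python
def overlapping_clusters(clusters, block_edges):
    """One overlapping cluster per block-graph edge: the union of its endpoints."""
    return [
        tuple(sorted(set(clusters[i]) | set(clusters[j])))
        for i, j in block_edges
    ]


# Non-overlapping clusters and block-graph edges from the Section V example.
clusters = {1: (1, 2), 2: (4, 5), 3: (3,), 4: (6, 7)}
block_edges = [(1, 2), (1, 3), (2, 4), (3, 4)]

regions = overlapping_clusters(clusters, block_edges)
m = max(len(nodes) for nodes in clusters.values())   # maximum block size
largest_region = max(len(r) for r in regions)        # never exceeds 2 * m
```

Here m = 2 and the merged clusters have sizes 4, 3, 4, and 3, consistent with the 2m bound.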



Remark 8: We can apply the block-graph framework to generalize GBP based algorithms. Our results show that this leads to better marginal estimates; see GBP-4 vs. B2-GBP-4, IJGP-3 vs. B2-IJGP-3, and IJGP-4 vs. B2-IJGP-4. More specifically, looking at Table V, we notice that using block-graphs of size two on GBP results in the error being reduced to nearly 15% of the original error.
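The ratio quoted in Remark 8 can be recomputed directly from the GBP-4 and B2-GBP-4 mean errors reported in Table V. The short sketch below does this arithmetic for the four interaction strengths; the helper name is hypothetical.

```python
def percent_of_original(original, generalized):
    """Generalized-algorithm error expressed as a percentage of the original error."""
    return 100.0 * generalized / original


# Mean errors from Table V (20 x 20 grid, repulsive potentials),
# keyed by interaction strength sigma: (GBP-4, B2-GBP-4).
table_v = {
    0.5: (0.0175, 0.0015),
    1.0: (0.0285, 0.0034),
    1.5: (0.1188, 0.0264),
    2.0: (0.0259, 0.0038),
}

ratios = {
    sigma: percent_of_original(gbp, b2_gbp)
    for sigma, (gbp, b2_gbp) in table_v.items()
}
```

The resulting percentages range from roughly 9% to roughly 22%, with the entry for interaction strength 2.0 falling just under 15%.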

Remark 9: In Fig. 11, we see that for many cases the performance of B2-IJGP-3 (B2-IJGP-4) is better than IJGP-6 (IJGP-8). This suggests that the set of overlapping clusters chosen using the block-graph framework may be better than the clusters chosen using the IJGP framework.

Remark 10: Overall, we observe that block-graph versions of TreeEP lead to the best estimates with reasonable computational time. For example, in Table IV with $\sigma = 1$, B2-TreeEP results in a mean error of 0.1583 running in an average of 0.455 seconds. In comparison, GBP-4 takes an average of about 95 seconds and the mean error is 0.0884. When compared to other algorithms, B3-CBP-2 runs in 0.32 seconds and results in a mean error of 0.2657. For mixed potentials in Table III, we observe that the generalized versions of TreeEP do not lead to significant improvements in the marginal estimates, although the performance of other algorithms does improve. Reference [20] proposes a generalization of TreeEP and gives guidelines for choosing clusters in the GBP algorithm. As shown for IJGP and GBP, our framework can be used in conjunction with frameworks that use overlapping clusters.

Remark 11: To our knowledge, there have been no algorithms for generalizing LC and CBP-l. The computational complexity of LC is exponential in the maximum degree of the graph [10], so it is only feasible to apply LC to a limited number of graphs. We only used LC for the 5 x 5 grid graph example in Fig. 10. We observe that the CBP-l algorithm improves the estimates of the BP algorithm. Moreover, for regimes where the interaction strength is small, the performance of generalized versions of CBP is comparable to that of TreeEP. For example, in Table IV for $\sigma = 0.5$, the best TreeEP algorithm has a mean error of 0.0694 and the best CBP based algorithm has a mean error of 0.0864. As another example, in Table V for $\sigma = 0.5$, the best TreeEP algorithm has a mean error of 0.0624 and the best CBP algorithm has a mean error of 0.0608.

Remark 12: Fig. 12 shows how the error scales as the size of the cluster in the block-graph increases for the 20 x 20 grid graph. It is clear that the error in general decreases as the cluster size increases; however, for some cases, the error does seem to increase, especially when the interaction strength is large.

$^5$ For IJGP, we used the software available at [51]. We could specify the maximum number of iterations, but not the stopping criteria.
TABLE I
10 x 10 Grid Graph with Repulsive Potentials: 30 Trials

| Algorithm | $\sigma = 0.5$: Error | Time (s) | $\sigma = 1$: Error | Time (s) | $\sigma = 1.5$: Error | Time (s) | $\sigma = 2.0$: Error | Time (s) |
|---|---|---|---|---|---|---|---|---|
| BP | 0.2122 | 0.0457 | 0.3714 | 0.0237 | 0.4773 | 0.0150 | 0.4220 | 0.0120 |
| B2-BP | 0.1405 | 0.0213 | 0.3080 | 0.0073 | 0.3379 | 0.0057 | 0.2978 | 0.0037 |
| B3-BP | 0.1065 | 0.0193 | 0.2509 | 0.0133 | 0.3019 | 0.0033 | 0.3565 | 0.0020 |
| IJGP-3 | 0.1864 | - | 0.3784 | - | 0.4560 | - | 0.4254 | - |
| B2-IJGP-3 | 0.1073 | - | 0.2827 | - | 0.3789 | - | 0.3044 | - |
| IJGP-6 | 0.1441 | - | 0.3442 | - | 0.2761 | - | 0.4232 | - |
| IJGP-4 | 0.1856 | - | 0.4300 | - | 0.4128 | - | 0.4230 | - |
| B2-IJGP-4 | 0.0997 | - | 0.2394 | - | 0.2873 | - | 0.3245 | - |
| IJGP-8 | 0.1038 | - | 0.2349 | - | 0.2407 | - | 0.4088 | - |
| CBP-2 | 0.1345 | 0.2507 | 0.2757 | 0.1710 | 0.4405 | 0.1240 | 0.3801 | 0.1137 |
| B2-CBP-2 | 0.0740 | 0.1687 | 0.2109 | 0.1147 | 0.3263 | 0.0870 | 0.2871 | 0.0660 |
| B3-CBP-2 | 0.0490 | 0.1583 | 0.1872 | 0.1213 | 0.2701 | 0.0890 | 0.3756 | 0.0767 |
| CBP-3 | 0.1056 | 0.5223 | 0.2224 | 0.3537 | 0.4176 | 0.2603 | 0.3459 | 0.2313 |
| B2-CBP-3 | 0.0561 | 0.3470 | 0.1726 | 0.2483 | 0.2824 | 0.1863 | 0.2752 | 0.1470 |
| CBP-4 | 0.0866 | 1.0303 | 0.2094 | 0.6913 | 0.3420 | 0.5337 | 0.3195 | 0.4650 |
| B2-CBP-4 | 0.0437 | 0.6810 | 0.1229 | 0.5067 | 0.2438 | 0.3800 | 0.2744 | 0.3123 |
| TreeEP | 0.0678 | 0.1993 | 0.1499 | 0.2513 | 0.1475 | 0.1680 | 0.1067 | 0.1630 |
| B2-TreeEP | 0.0547 | 0.1110 | 0.1273 | 0.1343 | 0.0489 | 0.1400 | 0.0551 | 0.1120 |
| B3-TreeEP | 0.0542 | 0.1463 | 0.0878 | 0.1683 | 0.0485 | 0.1447 | 0.0414 | 0.1527 |
| GBP-4 | 0.0110 | 21.0497 | 0.0439 | 28.8153 | 0.0532 | 25.7290 | 0.0379 | 25.5437 |
| B2-GBP-4 | 0.0005 | 16.7593 | 0.0026 | 25.5223 | 0.0021 | 28.8153 | 0.0018 | 33.1343 |

TABLE II
10 x 10 Grid Graph with Attractive Potentials: 30 Trials

| Algorithm | $\sigma = 0.5$: Error | Time (s) | $\sigma = 1$: Error | Time (s) | $\sigma = 1.5$: Error | Time (s) | $\sigma = 2.0$: Error | Time (s) |
|---|---|---|---|---|---|---|---|---|
| BP | 0.2337 | 0.0600 | 0.4482 | 0.0217 | 0.3857 | 0.0160 | 0.3537 | 0.0113 |
| B2-BP | 0.1778 | 0.0227 | 0.3975 | 0.0080 | 0.3597 | 0.0083 | 0.2567 | 0.0053 |
| B3-BP | 0.1358 | 0.0173 | 0.2622 | 0.0080 | 0.2799 | 0.0090 | 0.1640 | 0.0027 |
| IJGP-3 | 0.2125 | - | 0.3710 | - | 0.3088 | - | 0.3232 | - |
| B2-IJGP-3 | 0.1316 | - | 0.3406 | - | 0.2275 | - | 0.2044 | - |
| IJGP-6 | 0.1755 | - | 0.3846 | - | 0.2741 | - | 0.2967 | - |
| IJGP-4 | 0.2171 | - | 0.3708 | - | 0.3674 | - | 0.3912 | - |
| B2-IJGP-4 | 0.1259 | - | 0.3161 | - | 0.2211 | - | 0.1360 | - |
| IJGP-8 | 0.1291 | - | 0.3287 | - | 0.2831 | - | 0.4102 | - |
| CBP-2 | 0.1468 | 0.2543 | 0.3710 | 0.1590 | 0.3389 | 0.1337 | 0.3252 | 0.1007 |
| B2-CBP-2 | 0.0687 | 0.1647 | 0.2913 | 0.1127 | 0.3177 | 0.0877 | 0.2521 | 0.0707 |
| B3-CBP-2 | 0.0506 | 0.1607 | 0.1979 | 0.1143 | 0.2539 | 0.0943 | 0.1092 | 0.0763 |
| CBP-3 | 0.1028 | 0.5243 | 0.3289 | 0.3300 | 0.2648 | 0.2713 | 0.2855 | 0.2180 |
| B2-CBP-3 | 0.0490 | 0.3423 | 0.2522 | 0.2400 | 0.3006 | 0.1893 | 0.2416 | 0.1577 |
| CBP-4 | 0.0784 | 1.0373 | 0.2595 | 0.6690 | 0.2182 | 0.5497 | 0.2662 | 0.4593 |
| B2-CBP-4 | 0.0382 | 0.6737 | 0.1839 | 0.4723 | 0.2847 | 0.3857 | 0.1682 | 0.3233 |
| TreeEP | 0.0804 | 0.2300 | 0.2153 | 0.2470 | 0.1128 | 0.1720 | 0.0803 | 0.1977 |
| B2-TreeEP | 0.0499 | 0.1327 | 0.1390 | 0.1787 | 0.1104 | 0.1133 | 0.0552 | 0.1020 |
| B3-TreeEP | 0.0427 | 0.1407 | 0.0989 | 0.2117 | 0.0686 | 0.2090 | 0.0546 | 0.1337 |
| GBP-4 | 0.0085 | 21.4370 | 0.0500 | 30.0877 | 0.0501 | 25.9627 | 0.0340 | 24.5683 |
| B2-GBP-4 | 0.0005 | 16.7007 | 0.0033 | 26.8110 | 0.0022 | 28.9640 | 0.0012 | 32.2330 |
TABLE III
10 x 10 Grid Graph with Mixed Potentials: 30 Trials

| Algorithm | $\sigma = 0.5$: Error | Time (s) | $\sigma = 1$: Error | Time (s) | $\sigma = 1.5$: Error | Time (s) | $\sigma = 2.0$: Error | Time (s) |
|---|---|---|---|---|---|---|---|---|
| BP | 0.0514 | 0.0237 | 0.1542 | 0.2863 | 0.3178 | 0.4313 | 0.3728 | 0.4750 |
| B2-BP | 0.0337 | 0.0057 | 0.0955 | 0.0310 | 0.1862 | 0.1653 | 0.2422 | 0.2320 |
| B3-BP | 0.0243 | 0.0057 | 0.0824 | 0.0363 | 0.1364 | 0.1217 | 0.1851 | 0.1883 |
| IJGP-3 | 0.0431 | - | 0.1362 | - | 0.2719 | - | 0.3588 | - |
| B2-IJGP-3 | 0.0264 | - | 0.0820 | - | 0.1531 | - | 0.1905 | - |
| IJGP-6 | 0.0325 | - | 0.1015 | - | 0.2042 | - | 0.2612 | - |
| IJGP-4 | 0.0434 | - | 0.1330 | - | 0.2639 | - | 0.3449 | - |
| B2-IJGP-4 | 0.0246 | - | 0.0675 | - | 0.1479 | - | 0.1479 | - |
| IJGP-8 | 0.0243 | - | 0.0703 | - | 0.1567 | - | 0.1911 | - |
| CBP-2 | 0.0391 | 0.2240 | 0.0988 | 0.3727 | 0.1789 | 0.3817 | 0.2184 | 0.3913 |
| B2-CBP-2 | 0.0240 | 0.1190 | 0.0692 | 0.2107 | 0.1107 | 0.2583 | 0.1572 | 0.2653 |
| B3-CBP-2 | 0.0164 | 0.1230 | 0.0505 | 0.1957 | 0.1020 | 0.2440 | 0.1212 | 0.2600 |
| CBP-3 | 0.0351 | 0.4950 | 0.0879 | 0.8083 | 0.1549 | 0.8327 | 0.1984 | 0.8497 |
| B2-CBP-3 | 0.0208 | 0.2660 | 0.0642 | 0.4753 | 0.0988 | 0.5790 | 0.1325 | 0.6020 |
| CBP-4 | 0.0314 | 1.0173 | 0.0780 | 1.6667 | 0.1484 | 1.7300 | 0.1755 | 1.7780 |
| B2-CBP-4 | 0.0191 | 0.5620 | 0.0541 | 0.9970 | 0.0986 | 1.1933 | 0.1004 | 1.2360 |
| TreeEP | 0.0124 | 0.1190 | 0.0350 | 0.1650 | 0.0610 | 0.2177 | 0.0864 | 0.2743 |
| B2-TreeEP | 0.0124 | 0.0837 | 0.0386 | 0.1353 | 0.0573 | 0.1467 | 0.0825 | 0.2340 |
| B3-TreeEP | 0.0125 | 0.1057 | 0.0342 | 0.1490 | 0.0681 | 0.1980 | 0.0887 | 0.2413 |
| GBP-4 | 0.0009 | 10.8840 | 0.0054 | 17.2403 | 0.0091 | 22.2760 | 0.0139 | 23.1837 |
| B2-GBP-4 | 0.0000 | 10.8890 | 0.0002 | 15.1400 | 0.0008 | 17.8040 | 0.0013 | 18.6687 |

TABLE IV
15 x 15 Grid Graph with Repulsive Potentials: 20 Trials

| Algorithm | $\sigma = 0.5$: Error | Time (s) | $\sigma = 1$: Error | Time (s) | $\sigma = 1.5$: Error | Time (s) | $\sigma = 2.0$: Error | Time (s) |
|---|---|---|---|---|---|---|---|---|
| BP | 0.2187 | 0.1550 | 0.3977 | 0.0870 | 0.5307 | 0.0760 | 0.5737 | 0.0575 |
| B2-BP | 0.1848 | 0.0745 | 0.3836 | 0.0590 | 0.4598 | 0.0400 | 0.4500 | 0.0385 |
| B3-BP | 0.1314 | 0.0700 | 0.3020 | 0.0455 | 0.5541 | 0.0360 | 0.5140 | 0.0445 |
| IJGP-3 | 0.2052 | - | 0.4239 | - | 0.4846 | - | 0.5706 | - |
| B2-IJGP-3 | 0.1346 | - | 0.2683 | - | 0.3805 | - | 0.4571 | - |
| IJGP-6 | 0.1671 | - | 0.3903 | - | 0.5354 | - | 0.3411 | - |
| IJGP-4 | 0.2187 | - | 0.4406 | - | 0.5065 | - | 0.6234 | - |
| B2-IJGP-4 | 0.1367 | - | 0.3103 | - | 0.3515 | - | 0.4842 | - |
| IJGP-8 | 0.1566 | - | 0.4115 | - | 0.5518 | - | 0.5451 | - |
| CBP-2 | 0.1774 | 0.6895 | 0.3609 | 0.4940 | 0.4802 | 0.4440 | 0.5789 | 0.3435 |
| B2-CBP-2 | 0.1335 | 0.4550 | 0.3161 | 0.3390 | 0.4749 | 0.2705 | 0.4517 | 0.2425 |
| B3-CBP-2 | 0.0887 | 0.4520 | 0.2657 | 0.3200 | 0.5948 | 0.3115 | 0.4910 | 0.2515 |
| CBP-3 | 0.1578 | 1.4245 | 0.3234 | 1.0085 | 0.4503 | 0.8665 | 0.5113 | 0.7035 |
| B2-CBP-3 | 0.0946 | 0.9670 | 0.2964 | 0.6650 | 0.3852 | 0.5590 | 0.4319 | 0.5080 |
| CBP-4 | 0.1382 | 2.9145 | 0.2904 | 1.9985 | 0.4150 | 1.6810 | 0.5364 | 1.3610 |
| B2-CBP-4 | 0.0864 | 1.9710 | 0.2713 | 1.3210 | 0.3778 | 1.1395 | 0.3966 | 1.0225 |
| TreeEP | 0.0954 | 0.8500 | 0.2831 | 0.7995 | 0.5025 | 0.5895 | 0.7151 | 0.4345 |
| B2-TreeEP | 0.0888 | 0.5265 | 0.1583 | 0.4550 | 0.3566 | 0.4135 | 0.4293 | 0.3530 |
| B3-TreeEP | 0.0694 | 0.7530 | 0.1600 | 0.7835 | 0.2584 | 0.5290 | 0.1375 | 0.6025 |
| GBP-4 | 0.0151 | 88.3395 | 0.0884 | 95.5370 | 0.0918 | 86.9865 | 0.0639 | 88.9645 |
| B2-GBP-4 | 0.0014 | 81.1430 | 0.0120 | 117.4050 | 0.0105 | 117.1510 | 0.0062 | 120.3190 |
Fig. 12. Error as the size of the cluster increases in the 20 x 20 grid graph. The horizontal axis denotes the cluster size in the block-graph and the vertical axis denotes the mean error.

TABLE V
20 x 20 Grid Graph with Repulsive Potentials: 10 Trials

| Algorithm | $\sigma = 0.5$ | $\sigma = 1$ | $\sigma = 1.5$ | $\sigma = 2.0$ |
|---|---|---|---|---|
| BP | 0.2174 | 0.3182 | 0.7014 | 0.5926 |
| B2-BP | 0.1608 | 0.3412 | 0.6623 | 0.5117 |
| B6-BP | 0.0874 | 0.1320 | 0.4904 | 0.3102 |
| TreeEP | 0.0946 | 0.1243 | 0.4201 | 0.5124 |
| B2-TreeEP | 0.0969 | 0.2184 | 0.2736 | 0.5154 |
| B6-TreeEP | 0.0624 | 0.1330 | 0.1868 | 0.0574 |
| GBP-4 | 0.0175 | 0.0285 | 0.1188 | 0.0259 |
| B2-GBP-4 | 0.0015 | 0.0034 | 0.0264 | 0.0038 |
| B3-GBP-4 | 0.0003 | 0.0005 | 0.0021 | 0.0017 |
| CBP-2 | 0.2028 | 0.3222 | 0.7034 | 0.5550 |
| B2-CBP-2 | 0.1387 | 0.3023 | 0.6497 | 0.4465 |
| B6-CBP-2 | 0.0638 | 0.0952 | 0.5348 | 0.2926 |
| CBP-3 | 0.1893 | 0.3042 | 0.6634 | 0.5569 |
| B2-CBP-3 | 0.1248 | 0.2385 | 0.5538 | 0.4057 |
| B6-CBP-3 | 0.0637 | 0.0545 | 0.4947 | 0.2919 |
| CBP-4 | 0.1806 | 0.2450 | 0.6571 | 0.5409 |
| B2-CBP-4 | 0.1160 | 0.2842 | 0.5563 | 0.3990 |
| B6-CBP-4 | 0.0608 | 0.0451 | 0.4871 | 0.3116 |

C. Random Regular Graphs 

Tables VI and VII show results of applying the block-graph framework for inference over graphical models defined on random regular graphs with attractive potentials. Table VI considers graphs with 50 nodes and degree 3 and Table VII considers graphs with 70 nodes and degree 3. Just like the grid graph case, we observe that our generalized framework leads to better marginal estimates. This is shown by highlighting the algorithm that leads to minimal mean error for each inference algorithm. We observe that both CBP and TreeEP based algorithms perform the best, even when compared to GBP.



TABLE VI
Random Regular Graph with p = 50 Nodes, Degree 3, and Attractive Potentials: 30 Trials

| Algorithm | $\sigma = 0.5$: Error | Time (s) | $\sigma = 1$: Error | Time (s) | $\sigma = 1.5$: Error | Time (s) | $\sigma = 2.0$: Error | Time (s) |
|---|---|---|---|---|---|---|---|---|
| BP | 0.0290 | 0.0020 | 0.1683 | 0.0050 | 0.1514 | 0.0023 | 0.2809 | 0.0020 |
| B2-BP | 0.0217 | 0.0010 | 0.1513 | 0.0037 | 0.1414 | 0.0023 | 0.2520 | 0.0000 |
| B3-BP | 0.0167 | 0.0000 | 0.1407 | 0.0033 | 0.1394 | 0.0023 | 0.2817 | 0.0000 |
| CBP-2 | 0.0073 | 0.0630 | 0.0330 | 0.0827 | 0.0712 | 0.0610 | 0.2389 | 0.0467 |
| B2-CBP-2 | 0.0058 | 0.0500 | 0.0272 | 0.0653 | 0.0647 | 0.0487 | 0.1977 | 0.0397 |
| B3-CBP-2 | 0.0056 | 0.0530 | 0.0235 | 0.0670 | 0.0469 | 0.0560 | 0.1590 | 0.0450 |
| CBP-3 | 0.0052 | 0.1357 | 0.0194 | 0.1653 | 0.0358 | 0.1313 | 0.1074 | 0.1097 |
| B2-CBP-3 | 0.0040 | 0.1093 | 0.0131 | 0.1347 | 0.0368 | 0.1103 | 0.0919 | 0.0947 |
| TreeEP | 0.0101 | 0.0323 | 0.0687 | 0.0473 | 0.0815 | 0.0437 | 0.0675 | 0.0487 |
| B2-TreeEP | 0.0100 | 0.0320 | 0.0845 | 0.0553 | 0.0878 | 0.0543 | 0.0728 | 0.0457 |
| B3-TreeEP | 0.0096 | 0.0350 | 0.0576 | 0.0587 | 0.0650 | 0.0907 | 0.0443 | 0.0780 |
| GBP-3 | 0.0290 | 1.1053 | 0.1683 | 1.4173 | 0.1514 | 1.0257 | 0.2299 | 0.7763 |
| B2-GBP-3 | 0.0217 | 0.8870 | 0.1513 | 1.2520 | 0.1414 | 0.9837 | 0.2280 | 0.6937 |
| GBP-4 | 0.0230 | 1.0157 | 0.1548 | 1.4403 | 0.1439 | 1.1250 | 0.2286 | 0.7587 |

TABLE VII
Random Regular Graph with p = 70 Nodes, Degree 3, and Attractive Potentials: 20 Trials

| Algorithm | $\sigma = 0.5$: Error | Time (s) | $\sigma = 1$: Error | Time (s) | $\sigma = 1.5$: Error | Time (s) | $\sigma = 2.0$: Error | Time (s) |
|---|---|---|---|---|---|---|---|---|
| BP | 0.0172 | 0.0070 | 0.1313 | 0.0200 | 0.2144 | 0.0100 | 0.2410 | 0.0100 |
| B2-BP | 0.0154 | 0.0040 | 0.1211 | 0.0120 | 0.2071 | 0.0090 | 0.2895 | 0.0040 |
| B3-BP | 0.0106 | 0.0025 | 0.0871 | 0.0120 | 0.1866 | 0.0065 | 0.2037 | 0.0040 |
| CBP-2 | 0.0069 | 0.0990 | 0.0397 | 0.1425 | 0.1036 | 0.1185 | 0.1833 | 0.0965 |
| B2-CBP-2 | 0.0069 | 0.0870 | 0.0459 | 0.1220 | 0.0923 | 0.1080 | 0.2571 | 0.0875 |
| B3-CBP-2 | 0.0063 | 0.0870 | 0.0271 | 0.1135 | 0.0882 | 0.1065 | 0.1542 | 0.0880 |
| CBP-3 | 0.0057 | 0.2120 | 0.0247 | 0.2840 | 0.0811 | 0.2465 | 0.1547 | 0.2050 |
| B2-CBP-3 | 0.0063 | 0.1875 | 0.0264 | 0.2545 | 0.0727 | 0.2285 | 0.0897 | 0.1925 |
| TreeEP | 0.0044 | 0.0500 | 0.0404 | 0.0860 | 0.1028 | 0.0985 | 0.0741 | 0.1025 |
| B2-TreeEP | 0.0054 | 0.0575 | 0.0484 | 0.1130 | 0.1016 | 0.0990 | 0.0969 | 0.1000 |
| B3-TreeEP | 0.0049 | 0.0715 | 0.0341 | 0.1005 | 0.0682 | 0.1575 | 0.0548 | 0.1730 |
| GBP-3 | 0.0150 | 1.4935 | 0.1081 | 3.0275 | 0.2100 | 2.2155 | 0.2149 | 2.0235 |
| B2-GBP-3 | 0.0127 | 1.4910 | 0.0939 | 2.! | 0.1998 | 2.1770 | 0.2046 | 1.6230 |

VII. Summary

We proposed a framework for generalizing inference algorithms (algorithms that compute marginal distributions given the joint probability distribution) over graphical models (see Fig. 1). The key components in our framework are (i) constructing a block-tree, a tree-structured graph over non-overlapping clusters, and (ii) constructing a block-graph, a graph over non-overlapping clusters. We proposed a linear time algorithm for constructing block-trees and showed how large clusters in a block-tree can be split in a systematic manner to construct block-graphs that are favorable for inference. Using numerical simulations, we showed that our framework for generalized inference in general leads to improved marginal estimates for many approximate inference algorithms implemented in the libDAI software package. This suggests that the generalized inference framework can be used as a wrapper for improving the performance of approximate inference algorithms. All the code and graphical models used in the numerical simulations can be downloaded from http://www.ima.umn.edu/~dvats/GeneralizedInference.html. Although the focus in this paper was on computing marginal estimates, our proposed block-graph based framework can also be used to generalize algorithms for computing the partition function ($Z$ in (1)) [52], [53] or for the problem of MAP inference [54]-[57].

There are several interesting research directions that can be further pursued to improve our generalized 
inference framework. Our algorithm for constructing block-graphs only used the structure of the graph in 
computing the set of non-overlapping clusters. Using the parameters of the graphical model may result 
in improved marginal estimates. Further, it may be of interest to design block-graphs that are specific to 
the inference algorithm of interest. Another interesting research direction is to combine frameworks that 
choose overlapping clusters with the block-graph framework. 